Flexible CSV Importer

Introduction

This plugin was conceived to bulk import new Individual records and their basic facts either into an empty Project or an established Project where the new individuals are unrelated to existing individuals. It has evolved to support more than just basic facts and to enable synchronization with existing Individual records and their facts in established Projects.

The plugin imports a CSV file, typically with one row per Individual record, and adds people, facts and simple relationships to a Project. It provides a dialogue to map columns to fields.

It assumes that column headings follow the conventions set out below. If a valid data reference is not found, the plugin will attempt to construct one. Simple terms like DOB, Place of Birth or Spouse's father should be interpreted correctly.

The advice is quite technical because the data is aligned with the underlying FH/GEDCOM database structure. For example, the CSV column headings are similar to the FH Data References used within FH Expressions. So it is advantageous to become familiar with the FH Data Reference features. e.g. A birth date has the Data Reference INDI.BIRT.DATE and the equivalent CSV column would be P.BIRT.DATE for the Primary Individual or S.BIRT.DATE for their Spouse.

CSV Conventions
Family Relationships
Individual Synchronization
Fact Synchronization
Source Citations
Sample Plugin Demonstration Data
Running the Plugin

CSV Conventions

These conventions apply to the CSV file column headings and the data in the cells.

Individual records are identified with abbreviations, e.g. P for Primary, S for Spouse, PF for Primary’s Father, SM for Spouse’s Mother, etc.

Standard facts are imported by using their GEDCOM field tags, e.g. NAME, BIRT, MARR, DEAT, CENS.
Custom facts can be imported by entering the appropriate data reference tag, e.g. EVEN-CUSTOM_FACT or _ATTR-CUSTOM_FACT.
Hint: Run an Individual Query and use its Columns tab Fields pane to determine the data reference tag for any custom fact.
They are supplemented with detail tags, e.g. DATE, PLAC, ADDR, AGE, NOTE, TYPE.

Unique references (using the REFN tag) are used to identify the same individual throughout the data. The REFN cells must have a value but do not need to be numeric, so schemes like I1, I2, I3 or F01, F02, M01, M02 or even father, mother, child1 could be used providing that they are all unique. See the Family Relationships, Multiple Partnerships and Individual Synchronization sections below for typical uses.

Whilst the plugin is reasonably intelligent, field mapping using these Conventions in the CSV file itself is more reliable, which is the approach in the Sample Plugin Demonstration Data below.

Most conventional column headings have the format RECORD.FIELD.DETAIL but there are exceptional Special Cases as explained below.

Where…

RECORD identifies the Individual or Family record:

P = Primary ~ the principal person of interest in the record
PF = Primary’s father
PM = Primary’s mother
S = Spouse ~ of the primary person
SF = Spouse’s father
SM = Spouse’s mother

Family identifiers use the two individual abbreviations separated by an underscore:

P_S = the Primary person and their Spouse
PF_PM = Parent family of the Primary person
SF_SM = Parent family of the Spouse

FIELD is the GEDCOM tag or label for the fact, e.g. BIRT, Birth, DEAT, Death, CENS, Census, NAME, Name, REFN, EVEN-CUSTOM_FACT.

DETAIL is the GEDCOM tag for the detail of the field, e.g. DATE, Date, When, PLAC, Place, ADDR, Address, NOTE, Note, TYPE, Type.

For example, the primary person’s birth-place is the P.BIRT.PLAC column. PLAC and ADDR or other columns that contain commas must enclose cell values in ʺstring quotesʺ to hide the commas. That formatting is performed automatically by the spreadsheet application (Excel, LibreOffice, etc.).

The marriage date of a primary person’s parents is the PF_PM.MARR.DATE column. However, it is usually better to specify this fact against a parents’ own record (row), as illustrated below, rather than in the child record.

This is a very simple example of CSV data for a wife, husband, and his parents providing their birth and marriage dates.

Simple Example CSV

Special Cases

The primary REFN column heading can omit the P prefix. None of the REFN headings has a DETAIL part.

NAME fields are assumed to be full names, i.e. Given Names Surname, unless followed by a colon and qualifier GIVEN, SURNAME or SURNAME_FIRST, e.g. P.NAME:GIVEN column. A separate P.NAME:SURNAME column is essential if any surnames contain spaces such as Van De Burg. Otherwise, the last word Burg is assumed to be the surname.

Age at the time of a family event (e.g. marriage) is indicated by a colon and the individual’s identifier. For example, P_S.MARR.Age:P and P_S.MARR.Age:S columns.

Multiple columns can be concatenated into the same field but the columns must be named in a logical manner and be unique, e.g. TEXT1, TEXT2 or NOTE-A, NOTE-B or P.NOTE-A, P.NOTE-B would be combined and imported into P.NOTE using the concatenation separator chosen in the Settings on the first tab. This also works on the address fields but it may be preferable to format the address elements as one column with any required commas in the spreadsheet / CSV file as the use of ‘newline’ may have undesirable effects in the formatting.

Family Relationships

The marriage or partnership relationships are mapped in the spouse S.REFN column.

A child is mapped to the parents using PF.REFN and PM.REFN columns.

Multiple Partnerships

When one person has multiple partnerships they need multiple rows so that their S.REFN column entries identify each spouse.
As long as both spouses cross-refer to each other via the S.REFN column, their family facts can be added using either row.

Individual Synchronization

The REFN values that identify the same individual throughout the data can also synchronize with the Custom ID of existing records in a Project. Use the FH Tools > Record Identifiers… option to copy Record Id to the Custom ID to sync with those numbers as the REFN values.
Alternatively, REFN integer values can be treated directly as Record Id as explained under Directions and Settings below.

Different CSV rows will use the same REFN value when multiple instances of the same type of fact such as Census or Occupation need to be entered.

Fact Synchronization

The objective here is to allow existing facts in the Project to be supplemented with details supplied by the CSV file.
For example, an existing Birth event with a Date will sync with a CSV Birth event with the same Date and gain Place, Address &/or Note details defined in the CSV columns.

The fact synchronization rules are:

The existing and the CSV individual REFN value and the fact name (BIRT, DEAT, MARR, etc) must match.
Fact attribute values are disregarded to allow multiple facts with the same attribute value to be added, e.g. Multiple Occupation Doctor facts.
Only subfield values for Date, Age, and short text such as Place and Cause can match. Long text such as Address and Note subfields are not involved.
Each such existing fact subfield with a value will match if an equivalent CSV subfield cell has the same value or no equivalent CSV subfield column exists at all.
e.g. Existing Occupation Date has a value and equals the P.OCCU.DATE cell value or no P.OCCU.DATE column exists.
The matching CSV cell(s) must be somewhere to the left of the new value cells for those values to be added to the synchronized fact.
e.g. Cell P.CENS.DATE of 6 Jun 1841 must be to left of P.CENS.NOTE cell to add notes to an existing 6 Jun 1841 Census fact.
Any unmatched subfield will gain or merge a value from an equivalent CSV cell value.
e.g. An empty Place subfield in a synchronized Census fact will gain a P.CENS.PLAC cell value.

A snag with those rules arises when multiple instances of the same fact need to be added but no synchronization subfield values exist.
For example, imagine for the same individual an Occupation attribute is defined on two CSV rows with no Place or Date or Age values.
The first row creates an Occupation in the Project. The second row synchronizes with that created fact via rules 4. & 5. because there are no subfields. So both Occupation attribute values merge into one comma separated list instead of creating a new Occupation fact.
The workaround is to enter unique dummy values in a CSV subfield column such as Place values Z, ZZ, etc, or Date years 9001, 9002, etc, just for those multiple facts.
Then after running the plugin, delete all Place records with those unique values, or use a Fact Query to find and delete those unique values.

Source Citations

Separate source citation details can be added for each line. Column headings for citation elements are of the form:

SOUR>TITL = Source Title
SOUR.DATE = Entry Date
SOUR.QUAY = Assessment ~ 1, 2, 3, 4 or P, S, Q, U
SOUR.PAGE = Where within Source
SOUR.TEXT = Text From Source
SOUR.NOTE = Citation Note

Without a prefix as above, they are added to every new item. With a RECORD prefix, they add a whole record citation. With a RECORD.FIELD prefix they add Name & Fact citations.

If the SOUR>TITLE matches the Source record Title chosen in the Settings for every item imported then the Citation is applied to that Source.

Sample Plugin Demonstration Data

You can download this sample data in CSV format to demonstrate some functions of the Plugin. It is strongly suggested that you run this plugin in a new, empty Project.

Download the flexible_CSV_importer_sample_import demonstration CSV file.

The data would normally be collated in a spreadsheet (Excel, LibreOffice, etc.). It is important to save the output file in CSV format.

For clarity, the column headings in the sample CSV file are as follows:

REFN, P.NAME, P.SEX, P.CHR.DATE, P.CHR.PLAC, P.CHR.SOUR>TITL, P.BURI.DATE, P.BURI.PLAC, P.BURI.NOTE1, P.BURI.NOTE2, P.BURI.SOUR>TITL, PF.REFN, PM.REFN, S.REFN, P_S.MARR.DATE, P_S.MARR.PLAC, P_S.MARR.ADDR1, P_S.MARR.Address2, P_S.MARR.SOUR>TITL, P_S.MARR.AGE:P, P_S.MARR.AGE:S, P.OCCU, P.OCCU.DATE, P.OCCU.PLAC, P.OCCU.SOUR>TITL, P.OCCU.SOUR.PAGE, P.OCCU.SOUR.TEXT1, P.OCCU.SOUR.TEXT2, P.OCCU.SOUR.DATE, P.OCCU.SOUR.NOTE, P.NOTE1, P.NOTE2, P.NAME.SOUR>TITL, P.NAME.SOUR.PAGE, P.NAME.SOUR.TEXT, P.NAME.SOUR.DATE, P.NAME.SOUR.NOTE1, P.NAME.SOUR.NOTE2, P.SOUR>TITL, P.SOUR.PAGE, P.SOUR.TEXT, P.SOUR.DATE, P.SOUR.QUAY, P.SOUR.NOTE1, P.SOUR.NOTE2, P.CENS.DATE, P.CENS.PLAC, P.CENS.ADDR, P.CENS.SOUR>TITL

These are only examples of input data columns. In the sample data, the Primary REFN = 1 and 2 are repeated to include additional facts.

Running the Plugin

It is advisable to experiment on an empty Project or a copy of an existing Project to understand how the plugin features behave. After running the plugin, and before closing the Project, the Edit > Undo Plugin Updates command will remove all the imported data.

Directions and Settings

When this dialogue appears, choose your import CSV file at Step 1 then proceed to Step 2 and Step 3.

If no source record is required and thus no citations, change the Add a source citation to every item imported option to No.

If an existing source record for citations is not desirable, set the Select existing source record for such citations option to No.

There is a choice of separator for multiple note and other text fields, e.g. comma, semicolon, space, newline or two newlines.

Normally each REFN value in the CSV file will be retained in the Custom ID field. If this is not required change the setting to No.

Normally each REFN value is synchronized with Custom ID fields in the Project. If this is not required change the setting to No.

Set the Treat any unsynced integer REFN as a Record ID option to Yes to synchronize REFN values with Record ID if the record has no Custom ID.

Individual Record Sync

There are essentially four modes of individual record synchronization. It is advisable to experiment on an empty Project or a copy of an existing Project to understand how synchronization behaves to avoid creating unwanted individuals or merging the wrong individuals. Check that the REFN Settings are set appropriately every time the plugin is run because they are NOT remembered from run to run.

Custom ID Sync

This is intended where creating new individual records is the objective. Each REFN value synchronizes with the same value elsewhere in any CSV files or in the Custom ID field of existing individual records in the Project. This allows multiple CSV files to add new data in phases. The default REFN Settings support this mode.

Record ID Sync

This is designed to allow CSV files to sync with existing individual records in the Project. Each REFN field must hold a Record ID integer. The simplest method is to use the FH Tools > Record Identifiers… option to copy all Record ID to the Custom ID and then use Custom ID Sync as described above. Alternatively, change the three REFN Settings to No, No, Yes respectively. Then each REFN integer will synchronize directly with each individual Record ID in the Project.

Hybrid Sync

This combines the features of both modes above. All three REFN Settings must be set to Yes. Where possible, Custom ID Sync will try to match with newly created individuals, otherwise Record ID Sync will attempt to match with existing individuals. It is recommended that REFN values for Custom ID Sync contain at least one non-numeric character, whereas REFN values for Record ID Sync must be appropriate integers.

Prevent Sync

This ensures that the CSV data is never synchronized with any existing individual records in the Project and the REFN values are NOT retained in the Custom ID fields. All three REFN Settings must be set to No.

Checking the Mapping

Field Interpretation

The Field Interpretation tab of the plugin is important.

Check this before running the actual Import to your Project at Step 4 of the Directions and Settings tab.

If anything cannot be mapped in the process it appears red in the first column and can be corrected either in the input file and re-imported or modified in the mapping table.

The field interpretation can also be used to re-map specific columns to alternative tags, e.g. birth to christening by selecting the dropdown arrows in the field label and detail label columns.

It is recommended that the data is “cleaned” ready for import in your spreadsheet before creating your CSV file rather than correcting mappings each time during import.
For example:

1. 1. Ensure Date formats look like (d)d/MMM/yyyy and use “text” format for the cells so the spreadsheet does not try to format the dates.
  2. Enter Notes into a suitable column and delete trailing spaces and special characters with the formula =TRIM(CLEAR(cell-reference)).
  3. Ensure Place names use a consistent format such as City, County, State, Country in a single cell.

The More Information tab provides a summary of the above advice.

More Information

The Updates? button will check if a later plugin version is in the Plugin Store, and the Help button will display this online page of advice.

Plugin Flexible CSV Importer

Help content on this page is owned and provided by Mike Tate, the plugin's author, Calico Pie takes no responsibility for its content.

Plugin Store

Plugins and Language Packs for Family Historian